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The irreversibility of a stationary time series can be quantified using the Kullback-Leibler di- 
vergence (KLD) between the probability to observe the series and the probability to observe the 
time-reversed series. Moreover, this KLD is a tool to estimate entropy production from stationary 
trajectories since it gives a lower bound to the entropy production of the physical process gener- 
ating the series. In this paper we introduce analytical and numerical techniques to estimate the 
KLD between time series generated by several stochastic dynamics with a finite number of states. 
We examine the accuracy of our estimators for a specific example, a discrete flashing ratchet, and 
investigate how close is the KLD to the entropy production depending on the number of degrees of 
freedom of the system that are sampled in the trajectories. 
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I. INTRODUCTION 

The relationship between irreversibility and entropy 
production is mentioned in many undergraduate courses 
of thermodynamics and statistical physics. A canonical 
example is a glass falling to the ground and smashing 
into pieces. The time-reverse of this process is compat- 
ible with Newton's laws, but the chances for it to occur 
spontaneously are incredibly small. Such a process is ir- 
reversible and the signature of this irreversibility is the 
production of a macroscopic amount of entropy in the 
universe. 

The relation between irreversibility and entropy pro- 
duction was only a qualitative statement until the recent 
introduction of the Kullback-Leibler divergence (KLD) in 
the context of fluctuation theorems [JJ E] . The time irre- 
versibility of a process is given by the distinguishability 
between the process and its time reversal, which in turn 
can be quantified using the KLD or relative entropy, a 
measure of the distinguishability between two probabil- 
ity distributions defined in information theory |2- \ ij . This 
KLD, multiplied by the Boltzmann constant, turns out 
to be a lower bound to the entropy production along the 
process [U El I5HT5] . The bound becomes more accurate 
when the observables that are used to calculate the KLD 
contain a more complete description of the state of the 
system. This result has been derived in a variety of situ- 
ations such as driven systems under Hamiltonian [JJ El IB] 
and Langevin [TJJ [TU [THl [J7] dynamics; Markovian pro- 
cesses [71 [TD]; and also for electrical circuits [Hj. An- 
drieux et al. have verified it experimentally using the 
data of the position of a Brownian particle in a moving 
optical trap [TU [TB] , and we have shown that the bound 
yields useful estimates of the entropy production in non 
equilibrium stationary states (NESS) [DJ. 

Imagine repeatedly sampling (or measuring) an observ- 
able of a system in a NESS. The trajectory of the out- 
comes is a stationary time series that can be used to es- 
timate the KLD, by comparing the statistics of the time 
series with the statistics of the same series but time re- 



versed [i9J. This means that one can bound from below 
the entropy production in the NESS from a single time 
series obtained in an experiment. Such a tool is of inter- 
est in many practical situations. For instance, it allows 
one to discriminate between active and passive processes 
in biological systems, or even to estimate or bound the 
amount of entropy produced, and therefore the amount of 
ATP consumed in a biological process. In fact, there have 
been previous attempts to make this distinction. Martin 
et al. have considered the violation of the fluctuation- 
dissipation relationship as a signature of non-equilibrium 
in the motion of a hair cell by using two types of mea- 
surement: the spontaneous motion of the hair bundle 
and the response to an external force [191 . Amman et al. 
discriminated between equilibrium and NESS in a three 
state chemical system [2D]. Finally, Kennel introduced 
in |21j criteria based on compression algorithms to dis- 
tinguish between time symmetric and time asymmetric 
chaotic series but without any connection to the physical 
entropy. 

We are interested in estimating the KLD between the 
probability of observing a stationary trajectory of one 
or several observables of the system and the probability 
of observing the same trajectory but time reversed. We 
want to explore how this quantity bounds the entropy 
production of the underlying physical process [H El E] 
depending on the number of degrees of freedom of the 
system that are sampled in the observed stationary tra- 
jectory. Two distinct issues immediately arise: the es- 
timation of the KLD from an empirical stationary time 
series and the accuracy of the bound. In this paper we ad- 
dress these two issues by introducing numerical and semi- 
analytical techniques to estimate the KLD from data ob- 
tained from systems with a finite number of states. 

There have been different attempts to provide accurate 
estimators of the KLD from a finite number of data. Ref- 
erences Ell ES] investigate how this measure can be es- 
timated when considering empirical probability distribu- 
tions of two different Markovian and higher order Marko- 
vian time series. They develop techniques based on em- 



2 



pirical counting of finite sequences of data which are gen- 
eralized to real- valued time series in Refs. [HI [24j [25] . A 
different approach is given in where the KLD be- 
tween two different probability distributions is estimated 
using compression algorithms. In this paper we refine 
these methods and test their performance when used to 
estimate the KLD from single stationary trajectories. 

To explore the bound to the entropy production, we 
work with a discrete flashing ratchet model, where we 
can compare the entropy production with the analytical 
value and the empirical estimations of the KLD. With 
this model, we can analyze how information losses affect 
the estimation of the KLD and the tightness of the bound 
for the entropy production. 

The paper is organized as follows: section [IT] reviews 
the concept of the KLD, and discusses its connection with 
entropy production. In Section III we present novel ana- 



lytical and semi-analytical tools to calculate the KLD be- 
tween hidden Markov chains. Section |TV| gives a detailed 
description of the estimators of the KLD from empirical 
data, whose performance for the flashing ratchet is ana- 
lyzed in Sec. |Vj Finally, we present our main conclusions 
in Sec. ED 



II. KULLBACK-LEIBLER DIVERGENCE, 
IRREVERSIBILITY, AND ENTROPY 
PRODUCTION 

A. The Kullback-Leibler divergence 

The Kullback-Leibler divergence, or relative entropy, 
measures the distinguishability of two probability distri- 
butions p(x) and q(x): 



D\p(x)\\q(x)] 



dxp(x) loj 



(i) 



It is always positive, and vanishes if and only if p(x) = 
q(x) for all x. Its interpretation as a measure of dis- 
tinguishability is a consequence of the Chernoff-Stein 
lemma [3] : the probability of incorrectly guessing (via hy- 
pothesis testing) that a sequence of n data is distributed 
according to p when the true distribution is q is asymp- 
totically equal to e~ nD ^ x ''' q ^ x " . Therefore, whenp and q 
are similar —in the sense that they overlap significantly— 
the likelihood of incorrectly guessing the distribution, p 
or q, is large [3]. 

Let us recall a property of the KLD that we will use 
throughout the paper [3J. If we have two random vari- 
ables X, Y and two joint probability distributions p(x, y) 
and q(x, y), then 



D[p(x,y)\\q{x,y)}> D[p(x)\\q{x)} 



(2) 



This means that it is harder to distinguish between p and 
q when we consider only the marginal distributions, p{x) 
and q(x), instead of the full joint distributions, p(x,y) 
and q(x, y). If X, Y describe the state of a system, Eq. ([2| 



indicates that the KLD decreases when only a partial 
description of the system, given by the variable X, is 
available. 



B. Irreversibility and entropy production 

Consider a physical system with Hamiltonian H(z; A), 
where z denotes a point in phase space F, and A is a 
parameter of the system controlled by an external agent. 
The system is initially isolated in equilibrium at tem- 
perature T, and the external agent modifies A following 
a protocol A t , with t € [0,t]. We then let the system 
equilibrate by coupling it to a bath at temperature T". 
The initial and final states of this process are equilibrium 
states for which entropy is well defined. We denote by 
p(z,t) the probability density on phase space at time t, 
and by p(z, t) the probability density when the system 
is driven by the time-reversed protocol At = \ T -t with 
t €E [0, t]. Here z denotes the point in phase space re- 
sulting from changing the sign of all momenta in z. In 
Ref. [2] it is proved that the change of the entropy AS 
in the system plus the bath, averaged over many realiza- 
tions of the process, satisfies 



(AS)=kD[p(z,t)\\p(S,r-t)}, 



(3) 



where k is Boltzmann's constant. Equation ([3]) is valid for 
a variety of initial equilibrium conditions [2J: canonical, 
multi-canonical (several uncoupled systems at different 
temperatures), and grand-canonical distributions, as well 
as for different types of baths equilibrating the system 
at the end of the process. In particular, for canonical 
initial conditions in the forward and in the time-reversed 
processes, both at the same temperature T, Eq. ^ reads 
(see Ref. 0) 

(AS) = (AS systcm ) + (ASbath) 
(AE) - AF (Q) 
T 



T 

(W) - AF 



(4) 



where (AE) and AF refer respectively to the system av- 
erage energy and free energy change, Q is the heat ex- 
changed with the thermal bath at the end of the process 
(realization dependent), and W = AE + Q is the work 
performed by the external agent. Therefore, in this spe- 
cific case, entropy production equals the average dissi- 
pated work (Wdiss) = (W) — AF divided by the temper- 
ature T and ^ becomes 



(W dh 



kTD[p(z,t)\\p(z,r-t)]. 



(5) 



Since the evolution is deterministic, except for the last 
stage where the system is connected to the bath, the 
point z at time t determines the whole trajectory of the 
system {z(£)}£_ . Then z(t) and {z(t)}J =0 carry the 
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same information and the KLD of their respective PDF's 
are equal. Equation (JsJ can be rewritten in terms of path 
probabilities V [TB] 

(^ diss ) = kTD[V({z(t)}U)\\V({z(r-t)}U)\. (6) 
On the other hand, integrating Crook's relationship [27] , 



W — AF = loe 



p(w) 



p^ W ) , where p(W) \p(W)] is the proba- 
bility density of the work done on the system along the 
actual (time-reversed) process [TBI 127J > one immediately 
gets 



(W diss ) = kTD[ P (W)\\p(-W)}. 



(7) 



Notice that the work W is a function of the trajectory 
{z(t)}t =0 containing much less information than the tra- 
jectory itself. As indicated by Eq. (J2J, the KLD of work 
distributions should in principle be smaller than the KLD 
of trajectory distributions. On the contrary, the KLD is 
the same, indicating that all the irreversibility of the pro- 
cess is captured by the dissipative work [TB]. 



C. Stationary trajectories 

We now proceed to apply the above results to station- 
ary trajectories. Consider a long process in which the sys- 
tem reaches a non-equilibrium stationary state (NESS) 
after a possible initial transient. In the NESS the exter- 
nal parameter is held fixed, Xt = A; the system is kept 
out of equilibrium due to the existence of baths at dif- 
ferent temperatures (a possibility that is included in the 
hypothesis used in [2] to prove ([3])) or different chemical 
potentials, external constant forces, etc. In the steady 
state, since the control parameter remains fixed, the pro- 
tocol and its time reversal are identical At = At = A [T5] , 
Therefore the probability distributions of the process and 
its time reversal are identical, V = V. In the long time 
limit, t — y oc, we can neglect the contribution of the tran- 
sient to the entropy production and rewrite ^ for the 
entropy production per unit of time S in the NESS [28] 
as 



(S) = lim -D [V ({z(t)V t=0 )\ \V {{z{r - t)Y t=0 )\ . (8) 

T— >oo T 

A similar expression can be obtained from the Gallavotti- 



Cohcn theorem [HI [H], AS ~ Hog 



, where 



Pr(-AS)' 

p T (AS) is the probability to observe an entropy produc- 
tion AS in the interval [0,r]. The Gallavotti-Cohen re- 
lationship, which is exact for t — > oo, yields, after aver- 
aging 



(S) = lim -D[ Pt (AS)\\ Pt (-AS)} 



(9) 



Consequently, although AS is another observable that is 
obtained as a function of the microstate of the system, 



the KLD calculated with AS yields the same value as 
the one calculated with full information of the system. 
Therefore entropy production captures all the informa- 
tion about the time irreversibility of the NESS. 

When one does not observe the entire microscopic tra- 
jectory {z(t)}t =0 in Q but the trajectory followed by one 
or several observables of the system x(t), the KLD only 
provides a lower bound to the entropy production |31| . 
Equations |7| and ^ indicate that the equality is re- 
covered if the observables determine in a unique way the 
entropy production or the dissipated work. 

In an experimental context, the observables are usually 
sampled at a finite frequency. The output is then a time 
series of data or discrete trajectory, x = (xi,X2, ■ ■ ■ , x n ), 
where Xj can be the value of a single or several observ- 
ables of the system. In this case, we are interested in 
estimating the entropy production per data of the under- 
lying physical process, which we denote by (S) in the rest 
of the paper. Entropy production per data is related to 
the KLD rate per data, which we define below. 

Given an infinitely long realization or time series sam- 
pled from a random process (i = 1, 2, . . . ), which can 
be multi-dimensional, we define by p(x™) the probabil- 
ity that a given string of m consecutive data is equal to 
x" 1 = (xi, X2, • • • , x m ). We define the m— th order KLD 
for this random process Xj by the distinguishability be- 
tween p(x™) and the probability p{x^ n ) to observe the 
reverse sequence of data x^ — (x m ,x m -i, • • ■ ,x\). 



DX=D\p( X ?)\\p(x 1 m )]= pK")iog44r 

Xi,-,x m P\ X rn) 

(10) 

The KLD rate for the process Xj is defined as the growth 
rate of D^. with the number of data, 



n x 

d x = lim ^ 

m— too m 



(11) 



By virtue of ^ and |2| , this quantity bounds from below 
the entropy production per data 



(S) > kd x , 



(12) 



where the bound is saturated if the random variable is the 
microstate of the system X — {q, p} and the sampling 
rate is infinite [31] or X determines uniquely the entropy 
production in the process. 

Equation (12) is our basic result. It reveals a striking 



connection between physics and the statistics of a time 
series. The left-hand side, (S), is a purely physical quan- 
tity, whereas the right-hand side, d x , is a statistical mag- 
nitude depending solely on the observed data, but not on 
the physical mechanism generating the data. Such a con- 
nection generalizes Landauer's principle relating entropy 
production and logical irreversibility in computing ma- 
chines [TJ[32j[33]. Equation (12) extends this principle 
and suggests that we can determine the average dissipa- 
tion of an arbitrary NESS, even ignoring any physical 
detail of the system. 
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D. 



Markovian trajectories obeying local detailed 
balance 



We first analyze how the bound ( 12 ) is expressed for 



Markovian time series that obey detailed balance by de- 
riving analytical expressions for both entropy produc- 
tion and the KLD rate. If the random process Xi is 
Markovian, the probability distribution p(x™) factorizes 
p(x™) = p(x\)p(x 2 \xi) ■ ■ ■ p(x m \x m -i), which also holds 
if we reverse the arguments, i.e., for p(x} n ). Substituting 
these expressions into equation ( |Tl"] ), we get 



d x = 



Xi ,X 2 



p(xi,x 2 )lot 



p(x 2 \xi) 
' p(xi\x 2 ) 



= D X -D x = D X , 



(13) 

since D x = when comparing a trajectory and its re- 
verse. Therefore, d x only depends on transition proba- 
bilities if X is a random Markovian process. 

We now relate d x in Eq. ( 13 1 with the entropy pro- 



duction when the system reaches a NESS, because it is 
in contact with several thermal baths. In this situation, 
the local detailed balance condition is satisfied. We call 
V{x,i) is the energy of the state Xi, and T XltX2 is the 
temperature of the bath that activates the transitions 
x\ — > x 2 and x 2 — > x\. The local detailed balance condi- 
tion reads in this case 



p{x 2 \xi) 
p{xi\x 2 ) 



exp 



h T 

™ X\ ,X2 



(14) 



Inserting ( 14 ) into ( 13 1 



V{xx)-V{x 2 ) 



Qx 1: x 2 _ {&} 



(15) 



where Q Xl ,x 2 — V(xx) — V(x 2 ) is the heat dissipated to 
the corresponding thermal bath in the jump x\ — > x 2 , 
and S is the total entropy production per data. There- 



fore, Eq. ( 12 ) is reproduced, with equality, in the case 
of a physical system obeying local detailed balance, if 
we have access to all the variables describing the system. 
The same conclusion is reached if we induce the NESS 
by means of non-conservative constant forces. 

Equation (13) can be explored further by defining the 



current from the state x\ to the state x 2 as the net prob- 
ability flow from x\ to x 2 , J Xl _> X2 = p(x±, x 2 ) —p{x 2 , x\). 
If the system is not far from equilibrium the current 
tends to zero, and the following condition is satisfied 



<^p(xt,x 2 ), yielding 



(5) 
k 



= d x =D, 



x 



E 



2p(x ll x 2 ) 



(16) 



by the product of a flow times a thermodynamic force 
that is proportional to the flow itself. Equation ( 16 1 im- 



plies that the time asymmetry of a Markovian process 
not far from equilibrium is revealed by the currents or 
probability flows that can be observed. In other words, a 
Markovian process without flows is time reversible. This 
is not the case for non-Markovian time series, where ir- 
reversibility can show up even in the absence of currents 
(see below and [5]). 



III. KULLBACK-LEIBLER DIVERGENCE 
BETWEEN HIDDEN MARKOV CHAINS 

In many experimental situations, a physical process 
is Markovian at a micro- or mesoscopic level of descrip- 
tion, but the observed time series only contain a subset 
of the relevant observables, being non-Markovian in gen- 
eral. This is the case in biological systems, where one 
can only register the behavior of some mechanical and 
maybe a few chemical variables, while most of the rele- 
vant chemical variables cannot be monitored. These kind 
of non-Markovian time series obtained from an underly- 
ing Markov process are called Hidden Markov chains [35j . 

In this section we derive a semi-analytical technique to 
calculate the KLD rate between hidden Markov chains. 
We focus on a simple case where the underlying Markov 
process is described by two observables X and Y; how- 
ever we only observe X whose evolution is described by a 
hidden Markov chain. The KLD rate for the observable 
X is 



hm log 2/1 fT . 



(17) 



It is convenient to write d x as a difference between two 
terms, d x — h x — h x , where 

h x = - hm 1 WxniogVp(*i\2/n, (18) 

m— Vnn m £ — ^ ^ — ^ 



is called Shannon entropy rate, and 



hm -]>>«) log ^r P (xL^), (19) 



cross entropy rate. Since the underlying process is 
Markovian, p{x™,y™) factorizes and both Shannon and 
cross entropy can be expressed in terms of the trace of a 
product of random transition matrices T [3S1 |3Z] • These 
are square MxM random matrices, where M is the num- 
ber of values that the variable y can take on, and their 
entries are given by 



T(xi,x 2 ) Vxy2 =p(x 2 ,y 2 \xi,yx). 



(20) 



This expression is well known from linear irreversible 
thermodynamics [34] . where entropy production is given 



Note the different role played by each variable in this for- 
malism: parameters defining the matrix (making 
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T a random matrix), whereas yi are subindices of the 
matrix elements. The Shannon and cross entropy can be 
expressed in terms of these matrices, 



h x = - lim — ( log Tr 



Y\ T(xi,x i+ i) 



i=l 



(21) 



lim — ( log Tr 

m— »oo rn \ 



j| ^ T(^jn— z+l 5 *^m — i ) 



(22) 

where (•) denotes the average over the random process 
Xi, which are weighted by p(x™). For sufficiently large 
m, Eqs. 



21 1 and (22 1 are self-averaging [37] . meaning 



that we do not need to calculate the average but just 
compute the trace for a single stationary trajectory. For 
any sufficiently long time series x = (pb\, &2, • ■ • ,x n ) with 
n large, the following expressions converge to —h and — h r 
almost surely, 



A x = - log 

n 



n-l 



\ [ T(xi,x i+1 ) 



h x , (23) 



A x = - log 
n 



»=l 



.x 



(24) 



where |j • || is any matrix norm that satisfies ||A • B|| < 
||A|| ||B|| 37] . In particular, the trace satisfies this condi- 
tion for positive matrices. In the context of random ma- 
trix theory, A x and A x are known as maximum Lyapunov 
characteristic exponents [381 and measure the asymptotic 
rate of growth of a random vector when being multiplied 
by a random sequence of matrices. In practice, we can 
estimate d x semi-analytically as 



A x -A > 



(25) 



Here A x and A x are estimated using (23) and (24 1 with 
a single time series x of size n, following a technique 
introduced in Rcf. [38 : we generate a random station- 
ary time series x = {x™} and compute the matrices T 
analytically; then a random unitary vector is multiplied 
by those matrices and normalized every I data, keeping 
track of the normalization factor; finally the product of 
these factors divided by n yields A . For A x , the same 
procedure is repeated but using the reversed time series 
x = The technique is semi-analytical since the 

transition probabilities are known analytically but a sin- 
gle random stationary time series x is necessary to esti- 
mate d x with the multiplication of n transition matrices 
that are chosen according to x. 

Let us recall that the estimator d x cannot be applied to 
empirical time series unless we know the Markov model 
behind the data. Consequently, it is not useful in practi- 
cal situations. However, we will use it to check the per- 
formance of the estimators introduced in the following 



section, which only need a single stationary time series 
to estimate the KLD and do not assume any knowledge of 
the dynamics generating these data. On the other hand, 
one can also get analytical approximations of Eqs. (21) 



and ( 22 ) by using the replica trick, in an analogous way 

The calculation is cum- 



as it has been done in Ref. [39 
bersome and is explained in Appendix [XJ Both the semi- 
analytical and the replica calculations are used in Sec. [V] 
to check the accuracy of several empirical estimators of 
the KLD. 



IV. ESTIMATING KLD RATES FROM SINGLE 
STATIONARY TRAJECTORIES 

In previous sections, we calculated the KLD analyti- 
cally (or semi-analytically) for series where we know in 
advance the dynamics of the underlying physical process. 
We now investigate how the KLD rate can be estimated 
from a single empirical stationary trajectory, obtained 
from a discrete stochastic process whose dynamics is un- 
known. We call Xi the value of the i— th data of an empiri- 
cal trajectory of n data, which is denoted by x = {ii}™ =1 . 
There are two types of estimators in the literature: plug- 
in estimators, based on empirical counting of sequences 
of data, and estimators based on compression algorithms. 
In this section, we introduce a refinement of the these two 
methods and analyse their performance for a specific ex- 
ample in Sec. [V] 



A. Plug-in estimators 

The simplest approach to estimate the KLD rate is 
known as the plug-in method |24j . which consists of an 
empirical estimation of the probabilities of sequences of 
m data, p{x™), appearing in Eq. (10). The probability 



to observe the sequence x™, p(x™), is estimated empiri- 
cally from simply counting the number of times that x™ 
appears in a single stationary trajectory x = (x\, . . . , x n ) 
of size n. The empirical probability distribution is 



-(m-X) 



F(x?) 



71—1) ^— ' p 



n — (m — 1) 



P =i 



(26) 

Then an estimate of is obtained by plugging the em- 
pirical probability distribution into Eq. (10): 



Dl = D\p*(x?)\\p*(x 1 J]= ]T p x «)log 



p x (x?) 
(27) 

Note that the probabilities in Eq. ( 27 ) include the super- 



script x to emphasize that they are obtained empirically 
from a single stationary time series x and therefore de- 
pend on each particular realization. The simplest way 



(> 



estimate d x would be by taking ^f- for m as large as 
possible. However, this naive approach is not efficient. 
The empirical probability p x (x™) — and therefore l) x — 
is less accurate as to increases, because the number of 
possible substring a;™ increases exponentially and the 
statistics shortly becomes poor. It is convenient to find 
alternative expressions with a fast convergence. It turns 
out that the slope of D* as a function of to, 



d x = D x — n x 

^m — li 



(28) 



Oi 



(29) 



For a Markovian time series, as shown in Eq. (|13j), the 
limit is reached for to = 2, and using distributions of 
three or more data we only get redundant information: 



also converges to the KLD rate but faster than 
plug-in estimator will be constructed as the limit 

tf* = lim d* . 



d^, for any to > 2. 



Therefore, eP 



d x is 



an excellent estimator of the KLD, d x . If x is a k-th 
order Markov chain (i.e., it is Markovian when consid- 
ering blocks of k data {ij}), then the limit is reached 



for to 



k. 



i k + l 



L k+2 



The convergence of ( 29 ) is then expected to be fast if a 
time series can be approximated by a fc-th order Markov 
chain. 

If the trajectory x is sampled from a general non- 
Markovian process, one needs further information to ex- 
trapolate d^ for to —} oo, specially when only moderate 
values of to can be reached. In the examples discussed be- 
low, we have found that convergence is well described by 
the following ansatz, proposed by Schiirmann and Grass- 
berger |40j to estimate Shannon entropy rate 



log™ 

mi 



(30) 



Here c and 7 are parameters that, together with d^, 
can be obtained by fitting the empirical values of d^ 
as a function of to. The fitting parameter d^ gives an 
estimation of the limit (29). 



This estimation method is efficient as long as there is 
sufficient statistics in the data, that is, if for every se- 
ries that occurs in the trajectory, its reverse x^ is 
observed at least once. On the other hand, if we find 
empirically j5 x (x™) 7^ while ^(x^) = for at least one 
case, the argument of the logarithm in Eq. ( 10 1 diverges, 
yielding d* n = 00. We can avoid this divergence by re- 
stricting the sum in Df n to sequences x™ whose reverse 



occur in the time scries: 



E 



(31) 



where (zf)* = {x™ | ^(ajf) ^ and p x (x^) ^ 0}. 
With this restriction, a lower bound to D* n is always 
obtained, Df* < D*. 



A different strategy is to artificially bias the empirical 
probabilities such that all of them become positive. In- 
stead of the observed empirical frequencies, we can use 
the following biased frequencies [JT] 



p*(x?) 



71 X (X™) +7 



(32) 



Here n x (x™) is the number of observations of x™ in x and 
7 is the bias, which is a small number that prevents any 
of the probabilities to be zero, assigning a probability 
of order 7/71 to sequences that are not observed. The 
denominator in Eq. (32 1 ensures normalization of p x (x™). 



B. Ziv-Merhav estimator 

Ziv and Merhav introduced in Ref. [26] an estimator 
of the KLD rate between two probability distributions 
based on compression algorithms. It consists on slicing or 
parsing stationary discrete time series into smaller parts 
according to a specific algorithm. The slicing produces 
a sequence of numbers (often called a dictionary) that 
contains the same data than the original series, but it 
is divided into subsequences, called phrases. The algo- 
rithms that are used are called compression algorithms 
because the number of phrases in which a time series x 
of n numbers is parsed into is smaller than n. 

The estimator is defined in terms of two concepts 
which are now described, the compression length of a 
sequence and the cross parsing length between two dif- 
ferent sequences. Given a series x = x™, its compres- 
sion length c(x") is defined as the number of distinct 
phrases in which it is parsed using the Lempel-Ziv (LZ) 
algorithm [42] . The LZ algorithm parses a series sequen- 
tially, such that each phrase that is added to the dictio- 
nary is the shortest distinct phrase that is not already 
in the dictionary. For example, let us consider the series 
x = xl 1 = (0,1,1,1,1,0,0,0,1,1,0). The LZ sequential 
parsing for this example is as follows: First we store the 
first element of the sequence x\ = in the dictionary as it 
is empty, hence Diet = {0}. Then we read the next num- 
ber, X2 = 1, which is not already in the dictionary, so x^ 
is added to the dictionary, Diet = {0|1}. The next num- 
ber in x\ x is = 1, which is already in the dictionary. 
Then we append to X3 the next number of the sequence, 
x^ = (1,1). This phrase is not in the dictionary and 
therefore it is parsed, Diet = {0|1|(1, 1)}. By doing this 
for all the series a;} 1 , we obtain the following dictionary 
of phrases Diet = {0|1|(1, 1)|(1, 0)|(0, 0)|(1, 1, 0)}. The 
compression length is the number of phrases that the dic- 
tionary contains once the series x is completely parsed, 
c{x\^) = 6 in this example. The compression length of a 
stationary time series is related to its Shannon entropy 
rate [5] in the limit of infinitely long sequences: 



lim 



c(x™) logc(x™) 



(33) 



However, as d x — h x — h x , we also require an esti- 
mator for h x in order to determine d x . This is given 
in terms of another quantity called cross parsing length. 
The cross parsing of a series x™ with respect to an- 
other sequence z™ is obtained by parsing a;™ looking for 
the longest phrase that appears anywhere in z™. As 
an example, let us consider the cross parsing of x = 
= (0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0) with respect to another 



x 



li 



sequence z 



ai 



(1,0,0,1,0,1,0,0,1,1,0). The first 
number in x is x± = 0, which is in z. Therefore we append 
to Xi the next number in x, x\ = (0, 1). This sequence is 
also somewhere in z, more precisely it is equal to z| ,zf 
and Zg, so we append the next item in x, x\ = (0, 1, 1). 



Again this sequence is somewhere in z, xf 



3 _ ,10 



and it 



is added to the dictionary, Diet = {(0, 1, 1)} because x\ 
is not equal to any subsequence of z\ . We repeat this 
procedure again starting from x^ and the resulting dic- 
tionary is: Diet = {(0,1,1)|(1,1,0)|(Q,0,1,1,0)}. The 
cross parsing length is the number of parsed sequences, 
which in this example is equal to c r (x^ \z\ 1 ) — 3. In 
Ref. [26] it is proved that the following quantity tends to 
the KLD rate between the probability distributions that 



generated the sequences x = 
call p x and q z respectively, 



and 



z™, which we 




2 xlO'"' 4 XlO 5 6 xlO'"' 8 xlO'"' 1 xlO 6 

n 



lim -[c r K|zniogn-cK)logc(x?)] = d{p x \\q z ). 

n— too ft 

We can estimate d x by using as inputs in the left-hand 
side of the above equation a stationary time series and 
its time reverse. The Ziv-Merhav estimator of d when 
using a time series x of n data is introduced as follows 



l ZM 



n 



c(x?)logcK)] 



(35) 



which converges to d x when n oo, although the con- 
vergence is slow [2B] . This estimator has been used as a 
measure of distinguishability in several fields such as au- 
thorship attribution [22] or biometric identification [43] . 

When the KLD rate between the probability distribu- 
tions under consideration is small (d x < 1), the estima- 
tion given by Eq. (34) can be even negative [22 . The 



estimator gives negative values in some cases because it 
mixes two types of parsing: the sequential parsing of the 
trajectory and the cross parsing, which is not sequential. 
We propose the following correction, which helps to solve 
this issue and improves the performance of the estima- 
tor. We first evaluate ( 35 ) between different segments of 



the same trajectory. More precisely, we split x into two 
equal parts and apply the original estimator ( 34 ) 



Cr{x n nl2 \x'l 12 ) fog § - C« /2 ) fog C(x n nj2 ) 

n/2 



a ZM — 



(36) 



n/2 



If the time series is stationary, the two fragments, x Y ' 
and £™/ 2 , are equivalent and d^ M should vanish. How- 
ever it is usually negative for finite n and exhibits a slow 



FIG. 1: Sketch of the 3-state toy mod el u sed to check the 
accuracy of our compression estimator (37 1 and comparison 



between different compression estimators and the analytical 
value of d x . The analytical value of d x for a model with 
a = 0.5, /9 = 0.7,7 = 0.6 (d x = d x = 0.08278) is indicated 
by the solid black line in the plot. We show the value of 
the compression estimators obtained from a single stationary 
time series x" as a function of the length n: the Ziv-Merhav 
estimator d^ M (red dashed line), the bias d^ M (red dotted 
line) and our estimator (red squares). 



convergence to zero for large n |22j . Then, we define our 
estimator as 



L ZM 



(37) 



which still converges to d when n — ¥ oo and yields much 
better results for finite n, as we show with a simple ex- 
ample. 

We perform a first validation of this estimator using 
the three-state model illustrated in Fig.[T] Trajectories of 
the model are lists of numbers, 0, 1 or 2, representing the 
three states of the system. The dynamics is Markovian 
with transition probabilities given by po^i = 1 — Pi-s-o = 
a, Pi^2 = 1 - P2^i = P and p 2 ^o = 1 - Po^2 = 7- 
We call Xi the stochastic process describing the state of 
the system and x a particular stationary time series, e.g. 
x = (0, 2, 1, 0, 1, 2, 1, 2, ■ • • ). This time series is reversible 
only when the three transition probabilities satisfy the 
Kolmogorov condition 03], af3j = (1 — a)(l — /?)(1 — 7). 
In Fig. [I] (lower plot) we compare the value of different 
compression estimators with the analytical value of d x 
as a function of the length of the empirical trajectory 
n. Since the trajectories described by the state of the 
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FIG. 2: Illustration of our discrete ratchet model. Particles 
are immersed in a thermal bath at temperature T and move 
in one dimension in an asymmetric linear potential V\(x) of 
height 2V with periodic boundary conditions. The potential 
is switched on and off at a rate r, where Vq(x) = represents 
a flat potential, and the switching probability does not de- 
pend on the position of the particle. The state of the particle 
is represented by two random variables (X, Y) indicated in 
the figure, where X = {0, 1, 2} stands for the position of the 
particle whereas Y = {0, 1} for the state of the potential. Us- 
ing this description, the system can be in six different states, 
(0,0), (1,0), (2,0), (0,1), (1,1), (2,1). 



system are Markovian, 
probabilities: d x 



d x 



d x only depends on transition 
. We see that the Ziv-Merhav 



estimator d^ M fails to estimate d x accurately when it is 
small (d x ~ 0.083) and in some cases gives a negative 
value. The proposed estimator d*, on the other hand, 
is significantly closer to the analytical result, although 
slightly overestimates its true value. 



V. APPLICATION: THE DISCRETE FLASHING 
RATCHET 

A. The model 

We now apply the previous techniques to a specific ex- 
ample: a discrete flashing ratchet consisting of a Brown- 
ian particle moving on a one dimensional lattice |45j . The 
particle is immersed in a thermal bath at temperature T 
and moves in a periodic, linear, asymmetric potential of 
height 2V, which is switched on and off at a constant rate 
r (see Fig. [2]). Trajectories are denoted by two random 
observables: the position of the particle X (0, 1 or 2) and 
the state of the potential Y (ON, Y = 1 or OFF, Y = 0). 

The particle evolves in continuous time according to a 
Master equation. The dynamics is described in terms of 
rates of spatial jumps and switching. For each possible 
transition except switches, i.e. (xi,|/i) — > (2:2,2/2) with 
j/i=2/2= y, we define a transition rate fc(xi,y)-Kx2,y) 
obeying detailed balance, 



y(x 2 ,y) = exp 



V y (x 2 ) ~ V V (X X ) 



2kT 



When the potential is on (y — 1), the value of the poten- 
tial energy V\{x) is given in Fig. [2] When the potential 
is off, Vq(x) = for all x, and fc( ai ,o)->(x2,o) = 1 f° r 
x\ =/= X2- The switching rate does not depend on the po- 
sition of the particle: k(x,yi)-t(x,y 2 ) — r for any value of x 
and j/i 7^ J/2) and consequently violates detailed balance, 
driving the system out of equilibrium. 

We simplify the analysis by mapping the dynamics 
onto a discrete-time process, a Markov chain. To this 
end, we record in a time series (x,y) = {x™, y™} just 
a list of the visited states, discarding any information 
about the time where jumps and switches occur. The 
resulting Markov chain is denned by the transition prob- 
abilities 



MO^, 2/2)1(2:1, 2/1)] = 



i (xi,yi)^(x 2 ,y 2 ) 



V k 

L-IX 2 ,y 2 



2,1/2 "'(siiViJ-K^.ite) 



(39) 



Since we discard any information about the transition 
times, we will focus along the rest of paper only on dis- 
sipation and KLD rates per jump or per data. For finite 
switching rate r, the ratchet rectifies the thermal fluctu- 
ations inducing a current to the left in Fig. [2] 1331 US]- 
The system obeys a local detailed balance condition, as 
described in Sec. |IID| The nonequilibrium nature of the 
switching can be interpreted in two alternative ways: one 
can imagine that it is activated by a thermal bath at in- 
finite temperature or by an external agent |34j . In either 
of the two interpretations, switching does not induce any 
entropy production (the bath needs an infinite amount of 
energy to change its entropy and the external agent does 
not produce any entropy change). Therefore, entropy is 
only produced when heat is dissipated to the bath at 
temperature T, which only occurs when the potential is 
on. The average entropy production (or dissipation) per 
data in the time series is then [cfr. (p~5j)] 



<*>= E 



E 

y=0,l X!,X2=0,l,2 



p[(xi,2/); (x 2 ,y)] 



VyjXj) - V V (X 2 ) 

T 



(38) 



(40) 

which is equal to the KLD rate when calculated for 
time series containing the information of both position 
and state of the system (which we call full information), 
(S) = d X ' Y = d 2 ' • We now analyze how can d be esti- 
mated using single stationary trajectories of this model, 
and how close is this estimation to the entropy produc- 
tion depending on the number of degrees of freedom of 
the system that are sampled in the time series. 



B. Full information 

Firstly, we investigate the estimation of the KLD rate 
when using full information of the system (the position of 
the particle X and the state of the potential Y), and how 
close is this KLD rate to the actual entropy production of 
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FIG. 3: Analytical value of the average dissipation per data 
in units of kT (black line) as a function of /3V in the flashing 
ratchet (r = 1) and different estimators of d x ' Y . For each 
value of f5V, estimators are obtained from a single stationary 
time series of n = 10 6 data containing full information of the 
system (position, X, and state of the potential, Y): Plug-in 
estimators: d^' y (blue circles), dg' y (green diamonds), and 
dj' y using biased probabilities with 7 = 1 (blue open circles). 



sudden drops in y and d*' y are a consequence of lack 
of statistics in the trajectory. For the specific time series 
used in Fig. [3j the lack of statistics starts at (3V ~ 10 
for d*' y and arises earlier for d 3 ' y because the three-data 
sampling space is bigger and it is easier that some tran- 
sitions (xi, yi) — > (^2, 2/2) — > {%3, 2/3) do not appear while 
their reverse do. 

A more efficient way of dealing with the missing se- 
quences is incorporating a small bias to the empirical 
probabilities, as described in Eq. (32 1. This is equivalent 



to assigning a probability of order 1/n to those tran- 
sitions that are not observed in a time series of n data. 
Figure [3] shows d^' y with a bias 7 = 1 (blue open circles), 
which is able to extend the accuracy of the estimation 
even when there is lack of statistics. 

Although in the case of Markovian series with a fi- 
nite number of states the most convenient strategy is to 
use the plug-in estimator, we include for comparison the 
compression estimator d^ ,y (red squares) which gives ac- 
curate values of the dissipation for weak potentials. Fur- 
thermore, the compression estimator is better than some 
plug-in estimators even for strong potentials, since it does 
not exhibit sudden jumps due to lack of statistics. 



Compression estimator. <f*' y (red squares). 



the process. In Fig. [3] we compare the actual dissipation 
and several empirical estimations of d ,Y for different 
values of the height of the potential, V. For each value 
of V we simulate a single stationary time series of n = 
10 6 data that contains full information, and calculate the 
plug-in estimators d 2 , d 3 , as well as the compression- 
based estimator d^' y . 

Since trajectories containing full information are 
Markovian, the plug-in estimator immediately converges 
to the dissipation d* ' y = d x,y = d X Y = (S) jk if there is 
enough statistics, which happens when V is below or of 
order kT. If V ^> kT, the observation of the uphill jumps 
such as (0, 1) ->■ (1, 1), (0, 1) -» (2, 1), or (1, 1) -> (2, 1) 
is very unlikely in a single stationary trajectory. A time 
series of n data captures the statistics of jumps with 
probability well above 1/n, which amounts to say en- 
ergy jumps below fcTlogn, (fcTloglO 6 ss lAkT for the 
trajectory used in the figures). 

If, for instance, the transition (0, 1) — > (1, 1) is miss- 
ing in the trajectory, there is no way of estimating 
p[(0, 1); (1, 1)] which contributes to two terms in d*' y [see 



Eq. (10) for n = 2]. One of these two terms accounts for 
jumps (0, 1) — > (1, 1), which are very unlikely and their 
contribution to the total dissipation rate is negligible, and 
the other term accounts for jumps (1, 1) —> (0, 1), whose 
probability is larger and therefore contribute more signif- 
icantly to the entropy production. 

In Fig. [3J d£' y (blue circles) and d*' y (green diamonds) 
have been calculated restricting the average to sequences 
(of two or three data respectively) whose reverse are also 
observed in the time series, as given by Eq. (31). The 



C. Partial information 

We now analyze the performance of our estimators 
when there is not access to the full description of the sys- 
tem. As in [9], we assume that only the position of the 
ratchet X is observable. Accordingly, we simulate tra- 
jectories containing full information, and we remove the 
information of the state afterwards, (x, y) — > x. The re- 
sulting time series x = {a;™} is not Markovian and hence 
the limit (29) is not reached for small values of m. In 
this case, we proceed by obtaining df n for m as large as 



possible and fit the resulting values to the ansatz ( 30 ) 



We have generated trajectories of size n = 10 7 for val- 
ues of V that range from to 2kT. Once we remove 
the information of the state of the potential from these 
time series, we are able to estimate d^ up to to = 9 
with no lack of statistics. Figure [4] shows the plug-in 
estimators df n for to = 2,3,5,7,9 and the extrapola- 
tion (orange pentagons connected by a dashed line to 
guide the eye) resulting from the fit to the ansatz (30). 
For each value of j3V, we fit d% tl as a function of to for 
to = 2, 3, • • • ,9 to Eq. (30) using the curve fitting tool 
available in MATLAB, which provides a robust least- 
squares fit with bisquare weights as described in |46j . The 
fit itself for a particular value of the potential, (3V = 1, is 
shown in the inset of Fig. [4] Our ansatz reproduces the 
dependence of df n with to but the final estimator d^ still 
bounds significantly from below the actual dissipation 
(black solid line in Fig. [4J. Nevertheless, plug-in estima- 
tors clearly distinguish between equilibrium and NESS, 
even with partial information. In equilibrium (V = 0), 
the trajectories are reversible and all the estimators van- 
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FIG. 4: Average dissipation per data (black line) and plug-in 
estimators of d x using partial information given by the posi- 
tion (X) for a discrete flashing ratchet with r — 1. For each 
value of PV, we calculate estimators from a single stationary 
time series of n = 10 7 data containing partial information: 
d* (blue circles), df (green diamonds), d* (purple stars), d* 
(yellow triangles), dg (cyan hexagons) and the result from the 
fit d£o (orange pentagons with error bars and connected by a 
dashed line). Inset: d^ as a function of 1/m for m = 1, • • • ,9 
for PV = 1 (open black circles) and the fit to the ansatz 
(orange line). The y— intercept of the fit is indicated by an 
orange cross and it is equal to d^. 



ish, = for m = 2, • • • ,9, whereas for the NESS 
(V > 0) they detect the irreversibility of the process 
yielding d^ > for all m. This is illustrated in Fig. [i] 
where we plot the dependence of the plug-in estimators 
with the size of the trajectory. For f3V = 0, , d* and d* 
tend to zero when increasing the number of data whereas 
they saturate to a positive value in the NESS (/3V = 1). 

There are two possible origins for the discrepancy be- 
tween and the dissipation: either (i) our fit under- 
estimates the actual KLD rate d x of the trajectory; or 



(ii) the bound (12) is not tight. To address this question 
we need to calculate the actual value of d x . Since the 
position of the ratchet x is a hidden Markov chain, we 
can calculate its KLD rate d x semi-analytically, using 
the Lyapunov exponents ( 23|24 1 introduced in Sec. Ill 



In Fig. [6] we show the value of the semi- analytical cal- 
culation of d x using the norm of transition matrices, 
Eq. ( [25] ), which is not significantly different to the em- 
pirical estimation d^. We therefore conclude that d^ is 
a good estimation of d x , but still d x only yields a lower 
bound to dissipation whose accuracy is in principle hard 
to determine. This is an expected result, since the posi- 
tion of a particle in a flashing ratchet does not obey the 
Gallavotti-Cohen theorem |47j . 

Summarizing, although d^ turns out to be a good es- 
timator of d , using only information of the position we 
only get a lower bound to the dissipation. We also show 
in Fig. [(j] the value of d*, which is well below the plug-in 
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FIG. 5: Scaling of plug-in estimators of d x , d^,, with the 
size of the time series n, for a flashing ratchet (r = 1), for 
pV = (left) and /3V = 1 (right): d\ (blue circles), d% (green 
diamonds) and dj (purple stars). We simulate a single sta- 
tionary trajectory x of 10 7 data and calculate the estimators 
for subsequences containing the first n data of x . 




FIG. 6: Average dissipation per data (black line) and different 
estimators of d x for a flashing ratchet described with partial 
information [r = 1, n = 10 7 data) as a function of J3V: d^o 
(orange dashed pentagons) d* (red squares), replica estima- 
tion of d x (green dotted line) and semi-analytical value of d x 
(yellow crosses). Inset: Dependence of the average dissipa- 
tion (black line), d* (analytical values in blue dashed line), 
d* and d^ on /3V in the vicinity of j3V = 0. 



estimator d^. The compression estimator d* lies be- 
tween df and dg (not shown in the plot), indicating that 
it is only able to capture correlations up to size 8. For 
completeness, we include the calculation of d x based on 
the replica trick (see appendix[Al). It yields a tight bound 
for V < kT, but departs from d* for larger values of V. 
This deviation is caused by the estimation of the limits in 
Eqs. ( |A10|A13| , where we take a — »• when a is defined 
only for integer values, one of the standard drawbacks of 
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the replica trick [35]. 

Although our estimators give low values of the dissi- 
pation when using partial information, they still capture 
the asymptotic behavior for V small. Entropy production 
decreases as V 2 when V — > 0, so do plug-in estimators 



-*3 ' 



, dg, d^, and the compression estimator d*. Some 



of them are plotted in the inset of Fig. [6] (inset). On 
the other hand, d* oc V 6 , since the current is J oc V 3 
in this case [see Eq. ( 16 )] . Recall that calculating d% 



is equivalent to estimating the entropy production using 
currents and standard linear irreversible thermodynam- 
ics, as shown in Eq. (16). It is then remarkable that the 



estimators involving the statistics of three or more data 
are able to reproduce qualitatively the behavior of the 
dissipation in cases where linear thermodynamics fails. 

The improvement observed when using the plug-in es- 
timators of higher order than d% is more dramatic in 
a NESS which does not exhibit observable currents in 
X. In this case = but using higher order statis- 
tics we can still detect the time irreversibility of the 
trajectory [9]. This happens for example if we add 
to the flashing ratchet an external force F opposite to 
the current, i.e., pointing in the positive x— direction. 
The force modifies the energy landscape and conse- 
quently the spatial transition rates k(xxy)^f(x 2 ,y) by a fac- 
tor exp[/3FL( ai) j,).( aail ,)/2], L (xuy) . {x2 ^ y) being the spa- 
tial distance that separates the two points (x\,y) and 
(x2,y)- Here Lr XltV yr X2t y-\ 1S defined positive if the jump 
(xi, y) — > (#2) y) points in the same direction as the force 
(i.e. to the right), and negative otherwise. At the stalling 
force install, the current is canceled by the force and the 
system does not move on average when it is described 
only by X, but still dissipates energy. If we only have ac- 
cess to the information of the position, the system looks 
like it is in equilibrium: the spatial current vanishes, and 
so does d*, as shown in Fig. [7] However, there is a fi- 
nite dissipation (black line in the figure) and the corre- 
sponding irreversibility is captured by the statistics of 
substrings of length 3 or more. Although d x is below 
the real dissipation by an order of magnitude (see the 
semi-analytical value of d x , yellow crosses in Fig. [7]) , it 
does not exhibit any sensible change at stall force. Fi- 
nally, both d^ and d* provide estimates of d x which are 
correct within one order of magnitude (see the inset of 
Fig. §. 



VI. CONCLUSIONS 

We have shown that it is possible to estimate the en- 
tropy production rate by analyzing statistical properties 
of a time series observed in a NESS. The Kullback-Leibler 
divergence (KLD) per data between the time series and 
its time reversed is a lower bound to the entropy produc- 
tion rate. 

We have introduced two estimators of this KLD rate, 
one based on empirical frequencies and another on com- 
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FIG. 7: Average dissipation per data (in units of kT) in the 
flashing ratchet (with r = 2, and j3V = 2) and different esti- 
mations of d x obtained from a single time series of n = 10 7 
data containing partial information (position) as a function 
of the external force F: analytical value of the average dissi- 
pation (black line), d* (blue circles, analytical values in blue 
dashed line), d* (green diamonds), d* (red squares), semi- 
analytical calculation of d x (yellow crosses) and (orange 
hexagons). The minimum in d* corresponds to the stalling 
force. Inset: d*, semi-analytical value of d x and d^ as a 
function of the external force. 



pression algorithms, and we have checked their perfor- 
mance in a specific example: a discrete flashing ratchet. 
We show that the KLD is a powerful tool to identify 
nonequilibrium states and to estimate the entropy pro- 
duction of a process, if this entropy production is of order 
of the Boltzmann constant. We have also shown that the 
bound given by the KLD can detect a non-zero dissipa- 
tion even when the data does not exhibit any measurable 
flows. 

Let us summarize our results by presenting a "recipe" 
to estimate the KLD from an experimental time series 
recorded from a discrete system in a NESS. If the number 
of possible states of the system is small enough, the best 
approach is to calculate the plug-in estimators (28) 



and then check the convergence when m increases. The 
possible lack of statistics can be circ umven ted using a 
small artificial bias, as discussed in Sec 
rates for some value m 



IV A 



satu- 

then the time series is an m*-th 



If dl 



order Markov process and tf* = Otherwise, we can 
use the ansatz ( 30 ) and obtain which is a good esti- 



mate of the KLD rate. 

A second and complementary approach is the use of 
the compression estimator introduced in Sec. |IV B| The 
estimator yields correct results in the examples that we 
have analyzed, but there is no clue about the correspond- 
ing error. Nevertheless, the compression estimator could 
be the only possible approach if the number of states of 
the time series is large. In this case, the calculation of 
empirical probability distributions p(x™) would be un- 
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feasible even for short substrings. 

Another possible strategy for systems with many states 
(or described by real-valued observables) is to consider 
time asymmetric functionals of the data, which reduce 
the number of observables, and hence the number of 
states, but keep information about the irreversibility of 
the series. In any case, the estimation of KLD and the 
extension of our results to processes described by con- 
tinuous data is an open problem, which will be relevant 
in many practical situations, especially to analyze data 
coming from biological systems. 

Finally, let us mention that, as in the case of Lan- 
dauer's principle, the KLD could also be used to ascer- 
tain the minimal entropy production associated with a 
specific behavior, such as spatiotemporal patterns, ex- 
citable systems, etc. This in turn may influence the de- 
sign of optimal devices with functionalities given by these 
behaviors. 



energies in spin glasses [J5]. For our specific example, the 
trick is given by the following expression: 



(logTrT(xf)) = lim — log([TrT(:c™)n 
a->o da 



(A3) 



Reference [55] shows how to apply this technique when 
T(x™) is equal to a product of random matrices which 
are chosen following a Markovian process. In our case, 
an underlying Markovian process defined by two random 
variables, X and Y, defines the order of the matrices 
that are multiplied in T(x™). We now apply the tech- 
nique described in [39] to calculate h x . If we define the 
generalized Lyapunov exponent of degree a [38j as 



L x = lim -log([TrT«)n, (A4) 

m— yog ]Jl 
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,x 



lim —L 

a->o da 



x 



(A5) 



Now we consider the following property: Given a matrix 
A and a positive integer a, (TrA) Q = Tr(A® a ), where 
A® Q = A ® A ® • • • O A. Using this property, the aver- 



age in Eq. ( A4 ) reads 



Appendix A: Calculation of the KLD rate for hidden 
Markov chains using replica trick 

The semi-analytical calculation of the KLD rate for 
a specific case of hidden Markov chains was discussed 
in Sec. |III| We now introduce another technique to cal- 
culate Eq. (17) using a mathematical technique called 
replica trick. To this end, we first consider the expres- 
sion of d x in terms of Shannon and cross entropy rates, 
d x = h x — h x . We define the matrix resulting from 
the multiplication of m transition matrices [defined in 
Eq. ( 20 )] chosen according to x™ by 



t«) = n ^(xi,x i+1 ). 



(Al) 



([TrT(a;f )] Q ) = (Tr[T(x?)® a }) = Tr(T(x™)® a ) . 

(A6) 

Since the tensor power of a product of matrices factorizes, 



(ABC)® Q = A® Q B® a C® a , Eq. (A6) can be rewritten 



([TrT«)] a )=Tr £ ]J T(x« i, + i)„ +1 T(i„ x l+1 )® a . 

x™,y™ »=1 

We now define a block matrix T(a), where each block is 
a transition matrix T(xi, X2)® a+1 ■ The matrix elements 
of T(a) are therefore: 



Shannon entropy rate h can be rewritten by susbtitu- 



ing (Al) into Eq. (21), 



,x 



lim — (logTrT(x™)) 

m— >oo ui 



(A2) 



The analytical calculation of the average (log TrT(x™)) 
is cumbersome and it can only be done semi-analytically, 
as we explained in Sec. |III| However, we can express this 
average in terms of (TrT(x™)), which can be calculated 
analytically. The mathematical technique to do this is 
called replica trick and it was introduced to calculate free 



T(a) XuyuX2 , y2 = [T( Xl ,x 2 )® a+1 ] yuy2 . (A8) 



Using (K7\ and ( |A8| in ( |A4| , we see that L x is dom- 
inated by the largest eigenvalue of T(a) which we call 

r(a), 

L x = lim -logTr[r(a) m - 1 ]=logr(a), (A9) 



yielding, 



h = — lim — logr(a) 
a^o da 



(A10) 
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The above limit cannot be calculated analytically be- 
cause the tensor powers in T(a) are only defined for in- 
teger values of a. Therefore we approximate the limit 
a — > by an estimation of the slope of as a function 
of a close to a = 0, which is given by [SS] 



h x = 2L?- I f = 21ogr(l) - (All) 

We obt ain a n equivalent result for by replacing T(x" 1 ) 
in Eq. (A2) by the product of transition matrices but 
ordered according to the time- reversed series x x m , T{x] n ). 
Defining the following matrix 



Tr{a)x u v u x2,v3 = [T(a; 2 ,a:i) T <g> T(xi,x 2 )® a ] Vl ,y 2 , 

(A12) 

and being T r (a) the largest eigenvalue of T r {a), we get 



h* = - lim — logrJa). 
a->o da 



(A13) 



In practice, we also need to approximate the limit a — > 



in the above equation using Eq. (All ) but replacing r by 



/i* =21ogT r (l)- 



logT r (2) 



(A14) 



Finally, the estimation of d x for this kind of series us- 
ing replica trick, which is shown in Fig. [6] (green dotted 
line), is obtained with the difference between Eqs. (A14) 
and (lAllj), 



d A = h* - = 2 log 



A" 



7V(1) 

r(l) 



1 , r{2) 
2 l0g ^)- 



(A15) 
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